AITopics | Sydney

Video Frame Interpolation without Temporal Priors

Chaoyue Wang, The University of Sydney, Australia

Neural Information Processing Systems, Mar-19-2025, 23:03:21 GMT

Video frame interpolation, which aims to synthesize non-existent intermediate frames in a video sequence, is an important research topic in computer vision. Existing video frame interpolation methods have achieved remarkable results under specific assumptions, such as instant or known exposure time. However, in complicated real-world situations, the temporal priors of videos, i.e., frames per second (FPS) and frame exposure time, may vary across camera sensors. When test videos are taken under exposure settings different from those of the training videos, the interpolated frames suffer from significant misalignment. In this work, we solve the video frame interpolation problem in a general situation, where input frames can be acquired under uncertain exposure (and interval) time. Unlike previous methods that can only be applied to a specific temporal prior, we derive a general curvilinear motion trajectory formula from four consecutive sharp frames or two consecutive blurry frames without temporal priors. Moreover, utilizing constraints within adjacent motion trajectories, we devise a novel optical flow refinement strategy for better interpolation results. Finally, experiments demonstrate that one well-trained model is enough for synthesizing high-quality slow-motion videos under complicated real-world situations. Code is available at https://github.

artificial intelligence, interpolation, machine learning, (14 more...)

Neural Information Processing Systems

Country: Oceania > Australia > New South Wales > Sydney (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Machine learning for triage of strokes with large vessel occlusion using photoplethysmography biomarkers

Goda, Márton Á., Badge, Helen, Khan, Jasmeen, Solewicz, Yosef, Davoodi, Moran, Teramayi, Rumbidzai, Cordato, Dennis, Lin, Longting, Christie, Lauren, Blair, Christopher, Sharma, Gagan, Parsons, Mark, Behar, Joachim A.

arXiv.org Artificial Intelligence, Mar-9-2025

Objective. Large vessel occlusion (LVO) stroke presents a major challenge in clinical practice due to the potential for poor outcomes with delayed treatment. Treatment for LVO involves highly specialized care, in particular endovascular thrombectomy, and is available only at certain hospitals. Therefore, prehospital identification of LVO by emergency ambulance services can be critical for triaging LVO stroke patients directly to a hospital with access to endovascular therapy. Clinical scores exist to help distinguish LVO from less severe strokes, but they are based on a series of examinations that can take minutes and may be impractical for patients with dementia or those who cannot follow commands due to their stroke. There is a need for a fast and reliable method to aid in the early identification of LVO. In this study, our objective was to assess the feasibility of using a 30-second photoplethysmography (PPG) recording to assist in recognizing LVO stroke. Method. A total of 88 patients, including 25 with LVO, 27 with stroke mimic (SM), and 36 non-LVO stroke patients (NL), were recorded at the Liverpool Hospital emergency department in Sydney, Australia. Demographics (age, sex), as well as morphological features and beating rate variability measures, were extracted from the PPG. A binary classification approach was employed to differentiate between LVO stroke and NL+SM (NL.SM). A stratified 2:1 train-test split was repeated randomly across 100 iterations. Results. The best model achieved a median test set area under the receiver operating characteristic curve (AUROC) of 0.77 (0.71-0.82). Conclusion. Our study demonstrates the potential of utilizing a 30-second PPG recording for identifying LVO stroke.
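The evaluation protocol described in the Method (a stratified 2:1 train-test split repeated 100 times, summarized by the median test AUROC) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the data are synthetic, the "model" is a single-feature placeholder, and the interval is computed as an interquartile range, which is an assumption because the abstract does not say which interval it reports.

```python
import random
import statistics


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_split(labels, test_fraction, rng):
    """Split indices into (train, test) while preserving class proportions."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def repeated_auroc(features, labels, n_iter=100, seed=0):
    """Median and interquartile range of test AUROC over repeated 2:1 splits."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_iter):
        train, test = stratified_split(labels, 1 / 3, rng)
        # Placeholder model (assumption): orient one feature using training-set class means.
        mean_pos = statistics.mean(features[i] for i in train if labels[i] == 1)
        mean_neg = statistics.mean(features[i] for i in train if labels[i] == 0)
        sign = 1.0 if mean_pos >= mean_neg else -1.0
        aucs.append(auroc([labels[i] for i in test], [sign * features[i] for i in test]))
    q1, _, q3 = statistics.quantiles(aucs, n=4)
    return statistics.median(aucs), (q1, q3)


# Synthetic stand-in data (NOT the study's data): 25 positives, 63 negatives.
data_rng = random.Random(42)
labels = [1] * 25 + [0] * 63
features = [data_rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
median_auc, (q1, q3) = repeated_auroc(features, labels)
print(f"median AUROC {median_auc:.2f} (IQR {q1:.2f}-{q3:.2f})")
```

Fitting the placeholder on the training indices only, and scoring on the held-out third, mirrors the point of repeating the split: the spread of test AUROC values shows how much the estimate depends on which patients land in the test set.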

artificial intelligence, machine learning, vessel occlusion, (16 more...)

arXiv.org Artificial Intelligence

2503.13486

Country:

Oceania > Australia > New South Wales > Sydney (0.24)
Asia > Middle East > Israel (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Industry:

Health & Medicine > Therapeutic Area > Hematology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Therapeutic Area > Neurology > Dementia (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

MoCFL: Mobile Cluster Federated Learning Framework for Highly Dynamic Network

Fang, Kai, Deng, Jiangtao, Dong, Chengzu, Naseem, Usman, Liu, Tongcun, Feng, Hailin, Wang, Wei

arXiv.org Artificial Intelligence, Mar-3-2025

Frequent fluctuations of client nodes in highly dynamic mobile clusters can lead to significant changes in feature space distribution and data drift, posing substantial challenges to the robustness of existing federated learning (FL) strategies. To address these issues, we propose a mobile cluster federated learning framework (MoCFL). MoCFL enhances feature aggregation by introducing an affinity matrix that quantifies the similarity between local feature extractors from different clients, addressing dynamic data distribution changes caused by frequent client churn and topology changes. Additionally, MoCFL integrates historical and current feature information when training the global classifier, effectively mitigating the catastrophic forgetting problem frequently encountered in mobile scenarios. This combination ensures that MoCFL maintains high performance and stability in dynamically changing mobile environments. Experimental results on the UNSW-NB15 dataset show that MoCFL excels in dynamic environments, demonstrating superior robustness and accuracy while maintaining reasonable training costs.
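As a rough illustration of what an affinity matrix over clients' feature extractors could look like, the sketch below uses cosine similarity between flattened weight vectors and a softmax over each row to weight aggregation. Both choices, along with the function names and the `temperature` parameter, are assumptions made for illustration; the abstract does not specify how MoCFL computes or applies its affinity matrix.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def affinity_matrix(client_params):
    """Pairwise cosine similarity between clients' flattened extractor weights."""
    return [[cosine(u, v) for v in client_params] for u in client_params]


def aggregate_for(client, client_params, affinity, temperature=0.5):
    """Softmax-weighted average of all clients' weights, as seen by one client."""
    logits = [a / temperature for a in affinity[client]]
    peak = max(logits)  # subtract the max for numerical stability
    exp = [math.exp(x - peak) for x in logits]
    total = sum(exp)
    weights = [e / total for e in exp]
    dim = len(client_params[0])
    return [sum(w * p[d] for w, p in zip(weights, client_params)) for d in range(dim)]


# Hypothetical flattened extractor weights for three clients.
params = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
A = affinity_matrix(params)
print(aggregate_for(0, params, A))
```

In this toy setup, client 0 draws mostly on itself and the similar client 1, while the dissimilar client 2 receives a small weight, which is the intuition behind similarity-aware aggregation.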

accuracy, artificial intelligence, machine learning, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3696410.3714515

2503.01557

Country:

Asia > China > Zhejiang Province (0.29)
Oceania > Australia > New South Wales > Sydney (0.15)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

SCU: An Efficient Machine Unlearning Scheme for Deep Learning Enabled Semantic Communications

Wang, Weiqi, Tian, Zhiyi, Zhang, Chenhan, Yu, Shui

arXiv.org Artificial Intelligence, Feb-27-2025

Deep learning (DL) enabled semantic communications leverage DL to train encoders and decoders (codecs) to extract and recover semantic information. However, most semantic training datasets contain personal private information. Such concerns create a strong demand for erasing specified data from semantic codecs when previous users want to remove their data from the semantic system. Existing machine unlearning solutions remove data contribution from trained models, but usually in supervised, single-model scenarios. These methods are infeasible in semantic communications, which often need to jointly train unsupervised encoders and decoders. In this paper, we investigate the unlearning problem in DL-enabled semantic communications and propose a semantic communication unlearning (SCU) scheme to tackle the problem. SCU includes two key components. Firstly, we customize the joint unlearning method for semantic codecs, including the encoder and decoder, by minimizing mutual information between the learned semantic representation and the erased samples. Secondly, to compensate for semantic model utility degradation caused by unlearning, we propose a contrastive compensation method, which considers the erased data as the negative samples and the remaining data as the positive samples to retrain the unlearned semantic models contrastively. Theoretical analysis and extensive experimental results on three representative datasets demonstrate the effectiveness and efficiency of our proposed methods.
Semantic communication has attracted significant attention recently. It is regarded as a significant advancement beyond the Shannon paradigm, as semantic communication focuses on transmitting the underlying semantic information from the source, rather than ensuring the accurate reception of each individual symbol or bit irrespective of its meaning [1, 2]. With the burgeoning advancement of deep learning (DL), researchers found that employing DL models as the encoder and decoder greatly improves semantic transmission efficiency and reliability [3, 4]; this approach is called DL-enabled semantic communications. However, to train these DL semantic encoders and decoders, transmitters and receivers must first collect training datasets from huge amounts of users' activities [1], which contain rich personal privacy information. In healthcare scenarios, the server needs to collect users' sensitive information, such as blood pressure, heart rate, etc., for SC model training. Users also benefit from the downstream applications when the SC models are well-trained. This paper was supported in part by Australia ARC LP220100453, ARC DP200101374, and ARC DP240100955. W. Wang, Z. Tian and S. Yu are with the School of Computer Science, University of Technology Sydney, Australia.
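The contrastive compensation idea in the abstract (remaining data as positives, erased data as negatives) can be illustrated with a generic InfoNCE-style loss. This is a sketch of a standard contrastive objective, not the loss defined in the SCU paper; the function names, the dot-product similarity, and the `temperature` value are all assumptions.

```python
import math


def dot(u, v):
    """Dot product used here as the similarity between two representations."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(anchor, positives, negatives, temperature=0.5):
    """InfoNCE-style loss: low when the anchor is close to positives and far from negatives."""
    pos_terms = [math.exp(dot(anchor, p) / temperature) for p in positives]
    neg_terms = [math.exp(dot(anchor, n) / temperature) for n in negatives]
    denom = sum(pos_terms) + sum(neg_terms)
    return -sum(math.log(t / denom) for t in pos_terms) / len(pos_terms)


# Hypothetical 2-D semantic representations.
anchor = [1.0, 0.0]
remaining = [[0.9, 0.1], [0.8, 0.2]]  # treated as positive samples
erased = [[-0.9, 0.1]]                # treated as negative samples

loss_compensated = contrastive_loss(anchor, remaining, erased)
loss_swapped = contrastive_loss(anchor, erased, remaining)
print(loss_compensated, loss_swapped)
```

Minimizing a loss of this shape pulls representations toward the remaining data and away from the erased data, so the loss is lower when the roles match the intended ones than when they are swapped.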

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.19785

Country:

Asia (0.68)
Oceania > Australia > New South Wales > Sydney (0.24)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multiview graph dual-attention deep learning and contrastive learning for multi-criteria recommender systems

Forouzandeh, Saman, Krivitsky, Pavel N., Chandra, Rohitash

arXiv.org Machine Learning, Feb-26-2025

Recommender systems leveraging deep learning models have been crucial for assisting users in selecting items aligned with their preferences and interests. However, a significant challenge persists in single-criteria recommender systems, which often overlook the diverse attributes of items that Multi-Criteria Recommender Systems (MCRS) address. Approaches that share one embedding vector across multi-criteria item ratings have struggled to capture the nuanced relationships between users and items based on specific criteria. In this study, we present a novel representation for MCRS based on a multi-edge bipartite graph, where each edge represents one criterion rating of items by users, and Multiview Dual Graph Attention Networks (MDGAT). Employing MDGAT is beneficial and important for adequately considering all relations between users and items, given the presence of both local (criterion-based) and global (multi-criteria) relations. Additionally, we define anchor points in each view based on similarity and employ local and global contrastive learning to distinguish between positive and negative samples across each view and the entire graph. We evaluate our method on two real-world datasets and assess its performance based on item rating predictions. The results demonstrate that our method achieves higher accuracy compared to the baseline method for predicting item ratings on the same datasets. MDGAT effectively captures the local and global impact of neighbours and the similarity between nodes.
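The multi-edge bipartite graph described above, with one edge per criterion rating between a user and an item, can be represented with a plain nested mapping. The sketch below is only a data-structure illustration: the users, items, criteria, and ratings are hypothetical, and it does not implement MDGAT or any attention mechanism.

```python
from collections import defaultdict

# Hypothetical ratings: (user, item, criterion, rating). Names and values are illustrative.
ratings = [
    ("u1", "hotel_a", "cleanliness", 4.0),
    ("u1", "hotel_a", "location", 5.0),
    ("u2", "hotel_a", "cleanliness", 2.0),
    ("u2", "hotel_b", "location", 3.0),
]


def build_views(ratings):
    """Group edges by criterion: views[criterion][user][item] = rating."""
    views = defaultdict(lambda: defaultdict(dict))
    for user, item, criterion, rating in ratings:
        views[criterion][user][item] = rating
    return views


def multi_edges(ratings, user, item):
    """All parallel edges between one user and one item, keyed by criterion."""
    return {c: r for u, i, c, r in ratings if u == user and i == item}


views = build_views(ratings)
print(sorted(views))                          # criteria, one graph view each
print(multi_edges(ratings, "u1", "hotel_a"))  # parallel edges for one user-item pair
```

Each criterion yields its own bipartite view (the local, criterion-based relations), while the full set of parallel edges between a user and an item carries the global, multi-criteria relation.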

artificial intelligence, machine learning, recommendation, (17 more...)

arXiv.org Machine Learning

2502.19271

Country: Oceania > Australia > New South Wales > Sydney (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Random Forest-of-Thoughts: Uncertainty-aware Reasoning for Computational Social Science

Wu, Xiaohua, Tao, Xiaohui, Wu, Wenjie, Li, Yuefeng, Li, Lin

arXiv.org Artificial Intelligence, Feb-25-2025

Social surveys in computational social science are carefully designed around elaborate domain theories so that they reflect the interviewee's deep thoughts without concealing their true feelings. The candidate questionnaire options depend heavily on the interviewee's previous answer, which adds to the complexity, time, and expertise that social survey analysis requires. The ability of large language models (LLMs) to perform complex reasoning is enhanced by prompt learning such as chain-of-thought (CoT), but it is still confined to left-to-right decision-making processes or limited paths during inference. This means they can fall short in problems that require exploration and uncertainty searching. In response, a novel large language model prompting method, called Random Forest of Thoughts (RFoT), is proposed for generating uncertainty reasoning suited to computational social science. RFoT allows LLMs to perform deliberate decision-making by generating a diverse thought space and randomly selecting sub-thoughts to build the forest of thoughts. It can extend the exploration and prediction of overall performance, benefiting from the extensive research space of response. The method is applied to optimize computational social science analysis on two datasets covering a spectrum of social survey analysis problems. Our experiments show that RFoT significantly enhances language models' abilities on two novel social survey analysis problems requiring non-trivial reasoning.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.18729

Country:

North America > United States (0.47)
Asia > China (0.28)
Europe > Austria > Vienna (0.14)
Oceania > Australia > New South Wales > Sydney (0.14)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Detecting Linguistic Bias in Government Documents Using Large Language Models

de Swart, Milena, Hengst, Floris den, Chen, Jieying

arXiv.org Artificial Intelligence, Feb-19-2025

This paper addresses the critical need for detecting bias in government documents, an underexplored area with significant implications for governance. Existing methodologies often overlook the unique context and far-reaching impacts of governmental documents, potentially obscuring embedded biases that shape public policy and citizen-government interactions. To bridge this gap, we introduce the Dutch Government Data for Bias Detection (DGDB), a dataset sourced from the Dutch House of Representatives and annotated for bias by experts. We fine-tune several BERT-based models on this dataset and compare their performance with that of generative language models. Additionally, we conduct a comprehensive error analysis that includes explanations of the models' predictions. Our findings demonstrate that fine-tuned models achieve strong performance and significantly outperform generative language models, indicating the effectiveness of DGDB for bias detection. This work underscores the importance of labeled datasets for bias detection in various languages and contributes to more equitable governance practices.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3696410.3714526

2502.13548

Country:

Europe > Netherlands (0.89)
Oceania > Australia > New South Wales > Sydney (0.15)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government > Europe Government > Netherlands Government (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.83)

ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

School of Computer Science, The University of Sydney, Australia

Neural Information Processing Systems, Feb-10-2025, 22:01:59 GMT

Although no specific domain knowledge is considered in the design, plain vision transformers have shown excellent performance in visual recognition tasks. However, little effort has been made to reveal the potential of such simple structures for pose estimation tasks. In this paper, we show the surprisingly good capabilities of plain vision transformers for pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model called ViTPose. Specifically, ViTPose employs plain and non-hierarchical vision transformers as backbones to extract features for a given person instance and a lightweight decoder for pose estimation. It can be scaled up from 100M to 1B parameters by taking advantage of the scalable model capacity and high parallelism of transformers, setting a new Pareto front between throughput and performance. Besides, ViTPose is very flexible regarding the attention type, input resolution, pre-training and fine-tuning strategy, as well as dealing with multiple pose tasks. We also empirically demonstrate that the knowledge of large ViTPose models can be easily transferred to small ones via a simple knowledge token. Experimental results show that our basic ViTPose model outperforms representative methods on the challenging MS COCO Keypoint Detection benchmark, while the largest model sets a new state of the art, i.e., 80.9 AP on the MS COCO test-dev set.

artificial intelligence, machine learning, vitpose, (17 more...)

Neural Information Processing Systems

Country: Oceania > Australia > New South Wales > Sydney (0.40)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Before It's Too Late: A State Space Model for the Early Prediction of Misinformation and Disinformation Engagement

Tian, Lin, Booth, Emily, Bailo, Francesco, Droogan, Julian, Rizoiu, Marian-Andrei

arXiv.org Artificial Intelligence, Feb-6-2025

In today's digital age, conspiracies and information campaigns can emerge rapidly and erode social and democratic cohesion. While recent deep learning approaches have made progress in modeling engagement through language and propagation models, they struggle with irregularly sampled data and early trajectory assessment. We present IC-Mamba, a novel state space model that forecasts social media engagement by modeling interval-censored data with integrated temporal embeddings. Our model excels at predicting engagement patterns within the crucial first 15-30 minutes of posting (RMSE 0.118-0.143), enabling rapid assessment of content reach. By incorporating interval-censored modeling into the state space framework, IC-Mamba captures fine-grained temporal dynamics of engagement growth, achieving a 4.72% improvement over state-of-the-art across multiple engagement metrics (likes, shares, comments, and emojis). Our experiments demonstrate IC-Mamba's effectiveness in forecasting both post-level dynamics and broader narrative patterns (F1 0.508-0.751 for narrative-level predictions). The model maintains strong predictive performance across extended time horizons, successfully forecasting opinion-level engagement up to 28 days ahead using observation windows of 3-10 days. These capabilities enable earlier identification of potentially problematic content, providing crucial lead time for designing and implementing countermeasures. Code is available at: https://github.com/ltian678/ic-mamba. An interactive dashboard demonstrating our results is available at: https://ic-mamba.behavioral-ds.science.

machine learning, natural language, prediction, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3696410.3714527

2502.04655

Country:

North America > United States (1.00)
Europe (0.67)
Oceania > Australia > New South Wales > Sydney (0.15)

Genre: Research Report > New Finding (0.66)

Industry:

Media > News (1.00)
Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Behavioral Homophily in Social Media via Inverse Reinforcement Learning: A Reddit Case Study

Yuan, Lanqin, Schneider, Philipp J., Rizoiu, Marian-Andrei

arXiv.org Artificial Intelligence, Feb-5-2025

Online communities play a critical role in shaping societal discourse and influencing collective behavior in the real world. The tendency for people to connect with others who share similar characteristics and views, known as homophily, plays a key role in the formation of echo chambers, which further amplify polarization and division. Existing works examining homophily in online communities traditionally infer it using content- or adjacency-based approaches, such as constructing explicit interaction networks or performing topic analysis. These methods fall short for platforms where interaction networks cannot be easily constructed, and they fail to capture the complex nature of user interactions across the platform. This work introduces a novel approach for quantifying user homophily. We first use an Inverse Reinforcement Learning (IRL) framework to infer users' policies, then use these policies as a measure of behavioral homophily. We apply our method to Reddit, conducting a case study across 5.9 million interactions over six years, demonstrating how this approach uncovers distinct behavioral patterns and user roles that vary across different communities. We further validate our behavioral homophily measure against traditional content-based homophily, offering a powerful method for analyzing social media dynamics and their broader societal implications. We find, among other results, that users can behave very similarly (high behavioral homophily) when discussing entirely different topics like soccer vs. e-sports (low topical homophily), and that there is an entire class of users on Reddit whose purpose seems to be to disagree with others.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3696410.3714618

2502.02943

Country: